NumPy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays. Note that the terms array, NumPy array, and ndarray all refer to the same thing: the ndarray object.
In [ ]:
import math
import numpy as np
# Display precision settings: print floats with four decimal places
np.set_printoptions(formatter={'float': '{: 0.4f}'.format})
In [ ]:
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
arr1
In [ ]:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2
In [ ]:
arr2.ndim
In [ ]:
arr2.shape
Unless a data type is explicitly specified, np.array tries to infer a good data type for the array that it creates. The data type is stored in a special dtype object:
In [ ]:
arr1.dtype
In [ ]:
arr2.dtype
In addition to np.array, there are a number of other functions for creating new arrays:
Method | Description |
---|---|
array | Convert input data (list, tuple, array, or other sequence type) to an ndarray either by inferring a dtype or explicitly specifying a dtype. Copies the input data by default. |
asarray | Convert input to ndarray, but do not copy if the input is already an ndarray. |
arange | Like the built-in range, but returns an ndarray instead of a range object. |
ones, ones_like | Produce an array of all 1’s with the given shape and dtype. ones_like takes another array and produces a ones array of the same shape and dtype. |
zeros, zeros_like | Like ones and ones_like but producing arrays of 0’s instead. |
empty, empty_like | Create new arrays by allocating new memory but, unlike ones and zeros, do not populate them with any values. |
eye, identity | Create a square N x N identity matrix (1’s on the diagonal and 0’s elsewhere). |
To create a higher dimensional array with these methods, pass a tuple for the shape.
In [ ]:
np.zeros(10)
In [ ]:
np.zeros((3,6))
In [ ]:
np.ones((2,2))
In [ ]:
np.eye(3,3)
identity, an alternative to eye, takes only a single size argument:
In [ ]:
np.identity(2)
empty creates an array without initializing its values to any particular value. It is not safe to assume that np.empty will return an array of all zeros:
In [ ]:
np.empty((2, 3, 2))
In many cases it will return uninitialized garbage values.
arange is an array-valued version of the built-in Python range function:
In [ ]:
np.arange(15)
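The copy behaviour of array and asarray described in the table above can be checked directly. This is a minimal sketch; the variable names (base, copied, aliased, template) are illustrative and not part of the original notebook:

```python
import numpy as np

base = np.arange(4)

copied = np.array(base)     # np.array copies its input by default
aliased = np.asarray(base)  # input is already an ndarray, so no copy is made

base[0] = 99                # modify the original in place

print(copied[0])            # the copy still holds the old value
print(aliased[0])           # the alias sees the change
print(np.shares_memory(base, aliased))

# The *_like helpers take their shape and dtype from an existing array.
template = np.zeros_like(base)
print(template.shape, template.dtype == base.dtype)
```

If the input may be a list or an ndarray and a copy is not needed, asarray avoids duplicating large arrays.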
The data type or dtype is a special object containing the information the ndarray needs to interpret a chunk of memory as a particular type of data:
In [ ]:
arr1 = np.array([1, 2, 3], dtype=np.float64)
arr1.dtype
In [ ]:
arr2 = np.array([1, 2, 3], dtype=np.int32)
arr2.dtype
Dtypes are part of what makes NumPy so powerful and flexible.
It is often only necessary to care about the general kind of data you are dealing with, whether floating point, complex, integer, boolean, string, or general Python object.
When you need more control over how data are stored in memory and on disk, especially large data sets, it is good to know that you have control over the storage type.
Table: NumPy data types
Type | Type Code | Description |
---|---|---|
int8, uint8 | i1, u1 | Signed and unsigned 8-bit (1 byte) integer types |
int16, uint16 | i2, u2 | Signed and unsigned 16-bit integer types |
int32, uint32 | i4, u4 | Signed and unsigned 32-bit integer types |
int64, uint64 | i8, u8 | Signed and unsigned 64-bit integer types |
float16 | f2 | Half-precision floating point |
float32 | f4 or f | Standard single-precision floating point. Compatible with C float |
float64 | f8 or d | Standard double-precision floating point. Compatible with C double and Python float object |
float128 | f16 or g | Extended-precision floating point |
complex64, complex128, complex256 | c8, c16, c32 | Complex numbers represented by two 32-, 64-, or 128-bit floats, respectively |
bool | ? | Boolean type storing True and False values |
object | O | Python object type |
bytes_ (string_ before NumPy 2.0) | S | Fixed-length byte string type (1 byte per character). For example, to create a string dtype with length 10, use 'S10'. |
str_ (unicode_ before NumPy 2.0) | U | Fixed-length unicode type (number of bytes platform specific). Same specification semantics as the S type (e.g. 'U10'). |
In most cases they map directly onto an underlying machine representation, which makes it easy to read and write binary streams of data to disk and also to connect to code written in a low-level language like C or Fortran.
The numerical dtypes are named the same way: a type name, like float or int, followed by a number indicating the number of bits per element. A standard double-precision floating point value (what’s used under the hood in Python’s float object) takes up 8 bytes or 64 bits. Thus, this type is known in NumPy as float64.
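The naming rule can be verified by asking each dtype for its size. A small sketch, assuming only the standard np.dtype constructor and its itemsize attribute (which reports bytes, not bits):

```python
import numpy as np

for name in ("int8", "int32", "int64", "float32", "float64"):
    dt = np.dtype(name)
    # itemsize is in bytes, so the number in the name is 8 * itemsize
    print(name, dt.itemsize, "bytes =", 8 * dt.itemsize, "bits")

# The type codes from the table name the same dtypes.
print(np.dtype("f8") == np.dtype(np.float64))
print(np.dtype("i4") == np.dtype(np.int32))
```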
You can explicitly convert or cast an array from one dtype to another using ndarray’s astype method:
In [ ]:
arr = np.array([1, 2, 3, 4, 5])
arr.dtype
In [ ]:
float_arr = arr.astype(np.float64)
float_arr.dtype
In [ ]:
float_arr
In this example, integers were cast to floating point. If I cast some floating point numbers to be of integer dtype, the decimal part will be truncated:
In [ ]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
arr
In [ ]:
arr.astype(np.int32)
Should you have an array of strings representing numbers, you can use astype to convert them to numeric form:
In [ ]:
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)  # np.string_ was removed in NumPy 2.0
numeric_strings.astype(float)
Note that NumPy accepted the built-in Python float type as shorthand and mapped it to the equivalent dtype, np.float64.
If casting fails for some reason (for example, a string that cannot be converted to float64), a ValueError is raised.
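A failing cast can be reproduced with a string that does not parse as a number. This is a minimal sketch; the array bad_strings is made up for illustration, and in recent NumPy versions the exception raised for an unparsable string is a ValueError:

```python
import numpy as np

bad_strings = np.array(["1.25", "not-a-number"])

message = None
try:
    bad_strings.astype(np.float64)
except ValueError as exc:
    # The whole cast fails; no partially converted array is returned.
    message = str(exc)
    print("cast failed:", message)
```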
Another array's dtype can also be used:
In [ ]:
int_array = np.arange(10)
int_array
In [ ]:
calibers = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)
int_array.astype(calibers.dtype)
There are shorthand type code strings for referring to a dtype (see table above):
In [ ]:
empty_uint32 = np.empty(8, dtype='u4')
empty_uint32
Note:
– Calling astype always creates a new array (a copy of the data), even if the new dtype is the same as the old dtype.
– Keep in mind that floating point numbers, such as those in float64 and float32 arrays, are only capable of approximating fractional quantities. In complex computations, floating point errors may accrue, making comparisons only valid up to a certain number of decimal places.
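Both notes can be illustrated in a few lines. The sketch below uses np.isclose for a tolerance-based comparison and np.shares_memory to confirm that astype returned a copy; the variable names are illustrative:

```python
import numpy as np

a = np.array([0.1, 0.2])
total = a[0] + a[1]

# Exact equality fails because 0.1 and 0.2 are not exactly representable.
print(total == 0.3)
# Comparing within a tolerance is the usual remedy.
print(np.isclose(total, 0.3))

# astype returns a new array by default, even for an unchanged dtype.
same = np.arange(3.0)
recast = same.astype(same.dtype)
print(np.shares_memory(same, recast))
```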
Arrays are important because they enable you to express batch operations on data without writing any for loops. This is usually called vectorization. Any arithmetic operation between equal-size arrays is applied elementwise:
In [ ]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr
In [ ]:
arr * arr
In [ ]:
arr - arr
Arithmetic operations with scalars are as you would expect, propagating the value to each element:
In [ ]:
1/arr
In [ ]:
arr ** (.5)
Operations between differently sized arrays are called broadcasting and are not covered here.
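To make the idea of vectorization concrete, the sketch below computes the same result twice: once with explicit Python loops and once with a single array expression. The expression arr * arr + 1 is an arbitrary example chosen for illustration:

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

# Explicit loops visit every element one at a time...
looped = np.empty_like(arr)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        looped[i, j] = arr[i, j] * arr[i, j] + 1

# ...while the vectorized form states the same computation in one line.
vectorized = arr * arr + 1

print(np.array_equal(looped, vectorized))
```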
NumPy array indexing is a rich topic, as there are many ways to select a subset of array data or individual elements. One-dimensional arrays are simple; on the surface they act similarly to Python lists:
In [ ]:
arr = np.arange(10)
arr
In [ ]:
arr[5]
In [ ]:
arr[5:8]
An important first distinction from lists is that array slices are views on the original array; that is, the data is not copied, and any modifications to the view will be reflected in the source array. For example, if a scalar value is assigned to a slice, the value is propagated (broadcast) to the entire selection:
In [ ]:
arr[5:8] = 12
arr
In [ ]:
arr_slice = arr[5:8]
arr_slice[1] = 12345
arr
In [ ]:
arr_slice[:] = 64
arr
This may seem surprising, since many other programming languages copy data more zealously. But NumPy has been designed with large data use cases in mind, and working with views avoids the performance and memory problems that would arise if NumPy insisted on copying data left and right.
If you want a copy of a slice of an ndarray instead of a view, you will need to explicitly copy the array:
In [ ]:
arr[5:8].copy()
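Whether two arrays share the same underlying data can be tested rather than guessed. A minimal sketch using np.shares_memory and the base attribute; the names view and snapshot are illustrative:

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]             # a slice: shares arr's buffer
snapshot = arr[5:8].copy()  # an explicit copy: owns its own buffer

print(np.shares_memory(arr, view))
print(np.shares_memory(arr, snapshot))
print(view.base is arr)     # a view remembers the array it came from

view[:] = -1                # writes through to arr
print(arr)
print(snapshot)             # unaffected by the write
```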
With higher dimensional arrays, you have many more options. In a two-dimensional array, the elements at each index are no longer scalars but rather one-dimensional arrays:
In [ ]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d[2]
Thus, individual elements can be accessed recursively. But that is a bit too much work, so you can pass a comma-separated list of indices to select individual elements. So these are equivalent:
In [ ]:
arr2d[0][2]
In [ ]:
arr2d[0, 2]
In multidimensional arrays, if you omit later indices, the returned object will be a lower-dimensional ndarray consisting of all the data along the higher dimensions:
In [ ]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
arr3d
Both scalar values and arrays can be assigned to arr3d[0]:
In [ ]:
old_values = arr3d[0].copy()
arr3d[0].copy()
In [ ]:
arr3d[0] = 42
arr3d
In [ ]:
arr3d[0] = old_values
arr3d
Similarly, arr3d[1, 0] gives you all of the values whose indices start with (1, 0), forming a 1-dimensional array:
In [ ]:
arr3d[1, 0]
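Each index supplied removes one leading dimension from the result, which can be seen by printing the shapes. A short sketch that rebuilds the same arr3d so that it runs on its own:

```python
import numpy as np

arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

print(arr3d.shape)        # full array: three dimensions
print(arr3d[0].shape)     # one index: a 2D array
print(arr3d[1, 0].shape)  # two indices: a 1D array
print(arr3d[1, 0, 2])     # three indices: a single element
```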
Like one-dimensional objects such as Python lists, ndarrays can be sliced using the familiar syntax:
In [ ]:
arr[1:6]
Higher dimensional objects give you more options, as you can slice one or more axes and also mix integers with slices. Consider the 2D array above, arr2d. Slicing this array is a bit different:
In [ ]:
arr2d
In [ ]:
arr2d[:2]
In [ ]:
arr2d[:,2]
In [ ]:
arr2d[:2, 1:]
In [ ]:
arr2d[1, :2]
In [ ]:
arr2d[2, :1]
In [ ]:
arr2d[:, :1]
And again, assigning to a slice expression assigns to the whole selection:
In [ ]:
arr2d[:2, 1:]
In [ ]:
arr2d[:2, 1:] = 0
arr2d[:2, 1:]
In [ ]:
arr2d
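One detail of mixing integers and slices is easy to miss: an integer index drops its axis, while a slice of length one keeps it. The sketch below rebuilds the original arr2d so it is self-contained; col_1d and col_2d are illustrative names:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_1d = arr2d[:, 2]    # integer on axis 1: that axis disappears
col_2d = arr2d[:, 2:3]  # slice on axis 1: the axis is kept with length 1

print(col_1d.shape, col_1d)
print(col_2d.shape)

# Both results are views on arr2d, as with any basic slicing.
print(np.shares_memory(arr2d, col_1d), np.shares_memory(arr2d, col_2d))
```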
In [ ]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
names
In [ ]:
data = np.random.randn(7, 4)
data
(The randn function in numpy.random generates normally distributed random data.)
Suppose we wanted to select all the rows that correspond to the name 'Bob'.
Like arithmetic operations, comparisons (such as ==) with arrays are also vectorized. Thus, comparing names with the string 'Bob' yields a boolean array, which can be passed when indexing the array:
In [ ]:
names == 'Bob'
In [ ]:
data[names == 'Bob']
The boolean array must be of the same length as the axis it is indexing.
You can even mix and match boolean arrays with slices or integers (or sequences of integers, more on this later):
In [ ]:
data[names == 'Bob', 2:]
In [ ]:
data[names == 'Bob', 3]
In [ ]:
data[names != 'Bob']
To select two of the three names, combine multiple boolean conditions with boolean arithmetic operators such as & (and) and | (or):
In [ ]:
mask = (names == 'Bob') | (names == 'Will')
mask
In [ ]:
data[mask]
Selecting data from an array by boolean indexing always creates a copy of the data, even if the returned array is unchanged.
Note: The Python keywords and and or do not work with boolean arrays.
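The failure mode of the or keyword, and the elementwise alternatives, can be sketched as follows. The ~ operator for negation is included as a related operator that the text above does not mention:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

error = None
try:
    # `or` asks for the truth value of a whole array, which is ambiguous.
    (names == 'Bob') or (names == 'Will')
except ValueError as exc:
    error = str(exc)
    print("or failed:", error)

# The elementwise operators work as intended.
mask = (names == 'Bob') | (names == 'Will')
print(mask)
print(~mask)  # elementwise negation: selects the 'Joe' entries
```

The parentheses around each comparison are needed because | and & bind more tightly than ==.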
Setting values with boolean arrays works in a common-sense way. To set all of the negative values in data to 0 we need only do:
In [ ]:
data[data < 0] = 0
data
Setting whole rows or columns using a 1D boolean array:
In [ ]:
data[names != 'Joe'] = 7
data
Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays.
In [ ]:
arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i
arr
To select out a subset of the rows in a particular order, simply pass a list:
In [ ]:
arr[[4, 3, 0, 6]]
In [ ]:
arr[[-3, -5, -7]]
Passing multiple index arrays does something slightly different; it selects a 1D array of elements corresponding to each tuple of indices:
In [ ]:
arr = np.arange(32).reshape((8, 4))
arr
In [ ]:
arr[[1, 5, 7, 2], [0, 3, 1, 2]]
That is, it works like a coordinate system for picking elements: the elements $(1, 0)$, $(5, 3)$, $(7, 1)$, and $(2, 2)$ were selected.
Instead, one way to obtain the rectangular region formed by selecting a subset of the matrix’s rows and columns is:
In [ ]:
arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]
Another way is to use the np.ix_ function, which converts two 1D integer arrays to an indexer that selects the rectangular region:
In [ ]:
arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]
Fancy indexing, unlike slicing, always copies the data into a new array.
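The three behaviours described above (paired selection, rectangular selection, and copying) can be checked side by side. A minimal sketch; the names rows, cols, pairs, block_a, block_b, and picked are illustrative:

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows, cols = [1, 5, 7, 2], [0, 3, 1, 2]

# Paired index arrays pick the elements (1, 0), (5, 3), (7, 1), (2, 2).
pairs = arr[rows, cols]
print(pairs)

# Two ways to get the rectangular block of those rows and columns.
block_a = arr[rows][:, cols]
block_b = arr[np.ix_(rows, cols)]
print(np.array_equal(block_a, block_b), block_a.shape)

# Fancy indexing returns a copy: writing to it leaves arr untouched.
picked = arr[rows]
picked[:] = 0
print(arr[1, 0])
```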
Arrays have the transpose method and also the special T attribute:
In [ ]:
arr = np.arange(15).reshape((3, 5))
arr, arr.T
To compute the inner matrix product $X^T X$:
In [ ]:
arr = np.random.randn(6, 3)
np.dot(arr.T, arr)
For higher dimensional arrays, transpose will accept a tuple of axis numbers to permute the axes:
In [ ]:
arr = np.arange(16).reshape((2, 2, 4))
arr
In [ ]:
arr.transpose((1, 0, 2))
Simple transposing with .T is just a special case of swapping axes. ndarray has the method swapaxes which takes a pair of axis numbers:
In [ ]:
arr
In [ ]:
arr.swapaxes(1, 2)
swapaxes similarly returns a view on the data without making a copy.
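The view behaviour and the axis permutation can both be confirmed on the same 2 x 2 x 4 array. A short sketch; t and s are illustrative names for the two results:

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))

t = arr.transpose((1, 0, 2))  # swap axes 0 and 1
s = arr.swapaxes(1, 2)        # swap axes 1 and 2

print(t.shape, s.shape)

# Neither call copies data; both results are views on arr.
print(np.shares_memory(arr, t), np.shares_memory(arr, s))

# The same element is reached by permuting the indices accordingly.
print(arr[0, 1, 3], t[1, 0, 3], s[0, 3, 1])
```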
A universal function, or ufunc, is a function that performs elementwise operations on data in ndarrays. You can think of ufuncs as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.
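A few standard ufuncs illustrate the one-input, two-input, and multiple-output cases. This is a sketch with made-up input values; np.sqrt, np.maximum, and np.modf are existing NumPy ufuncs:

```python
import numpy as np

arr = np.arange(5)

# Unary ufunc: one input array, one output array.
roots = np.sqrt(arr)
print(roots)

# Binary ufunc: elementwise maximum of two arrays.
x = np.array([1.5, -2.0, 3.0])
y = np.array([0.5, 4.0, -1.0])
larger = np.maximum(x, y)
print(larger)

# A ufunc with two outputs: fractional and integral parts.
frac, whole = np.modf(np.array([2.5, -1.25]))
print(frac, whole)

print(isinstance(np.sqrt, np.ufunc))
```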
In [ ]:
m = np.eye(3,3)
print(m)
In [ ]:
m[m>0]=4
print(m)
In [ ]:
old = np.array([[1, 1, 1],
                [1, 1, 1]])
new = old
new[0, :2] = 0
print(old)
In [ ]:
a = np.array([[2,3,4], [1,2,3]])
b = a * 1.0
c = a[1][1] * 1.0
c
In [ ]:
cvalue = [25.4, 24.8, 26.9, 23.9]
C = np.array(cvalue)
C
In [ ]:
(C*9/5 + 32)[0]
In [ ]:
round((C*9/5 + 32)[0],-1)
In [ ]:
np.array([1,4,9,15])//5.0
In [ ]:
np.arange(0, 10, 0.5, dtype = None)
In [ ]:
L, S = np.linspace(0, 50, num=int(math.pi), retstep=True)  # num must be an integer; int(math.pi) == 3
S
In [ ]:
L
In [ ]:
x = np.array([[42,1,2],[1,2,3]])
x
In [ ]:
x[1][1]
In [ ]:
x%3
In [ ]:
x.ndim
In [ ]:
y = np.array([0., 1, 2, 3, 4., 5., 8, 13])
In [ ]:
print(y / 1.0)
print(y[1:4]*math.pi)
print(x[0, 1:])
In [ ]:
x = np.array([range(0, 51, 5),
              [x for x in range(0, 255, 25)],
              [True, True, True, True, True, False, False, False, False, False]],
             dtype=object)  # rows have unequal lengths, so an object array is required
s = x[0][:6]
In [ ]:
x[0][:6]
In [ ]:
s == x[0][:6]
In [ ]:
t = np.zeros((9))
t
In [ ]:
t = t.reshape(3,3)
t
In [ ]:
x = np.ones((3,3))
y = np.ones_like(x)
z = np.identity(5, dtype=float)
In [ ]:
x
In [ ]:
y
In [ ]:
z
In [ ]:
q = (np.eye(3,k=-1, dtype=int) + np.eye(3,k=+1, dtype=int)) * 4
q
In [ ]:
define = (3,3)
define
In [ ]:
rho = np.random.random(define)
rho
In [ ]:
print('The required random value is', "%.3f" % rho[2][0], 'to 3 d.p.')
In [ ]:
np.empty(define)
In [ ]:
print(z)
print(np.average(z))
print(np.median(z) < np.average(z))
print(np.std(z[0]))
print(np.max(z) * math.e)
In [ ]:
g = np.dot(q, z[:3,:3]) / 2.5
print(g)
In [ ]:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0.1, 2*np.pi, 10)
markerline, stemlines, baseline = plt.stem(x, np.cos(x), '-.')
plt.setp(baseline, 'color', 'r', 'linewidth', 2)
plt.show()